Abstract


A variety of models have been developed to estimate Disability-Free Life Expectancy (DFLE) 
and related indicators. However, these models typically make no explicit reference to underlying 
disease processes; nor do they allow for much heterogeneity in health status. Other modeling 
efforts have focused on patterns of incidence and progression of major diseases, and on more 
detailed health states. This paper reviews selected models of both types. One objective is to 
compare and contrast the various underlying assumptions in these models. Another is to develop 
a common conceptual framework within which all these various models can be naturally repre- 
sented. This framework provides not only the basis for comparison; it also indicates a possible 
direction for a synthesis of these models that can bring together the construction of summary 
health status indicators, including both DFLE and the more general notion of Population Health 
Expectancy, and explicit models of risk factor, disease and related processes. Finally, the con- 
ceptual framework is readily operationalized by use of monte carlo microsimulation methods, 
and the paper serves to explicate this methodology. 


Keywords: disability-free life expectancy, microsimulation, disease models, population health 
status 
Introduction¹


There are a variety of methods and models for estimating disability-free life expectancy 
(DFLE). Other models have been developed for specific disease processes. However, the mod- 
els of disease processes generally do not incorporate the capacity to estimate DFLE; and DFLE 
models generally do not consider underlying disease processes. It would clearly be desirable if 
the concept of DFLE could be expanded to consider a spectrum of health states, plus explicit 
representation of the disease processes underlying the onset and progression of disability, as well 
as associated sequelae and health care and other costs. This paper seeks such a synthesis and
extension of current modeling efforts. It reviews a variety of models pertinent to the estimation
of DFLE and generalizations thereof, and the explicit representation of disease processes. 


One way to discuss DFLE is to begin with a description of the original Sullivan (1971) 
methodology, and relate it to other modeling efforts. However, developing a consistent mathe- 
matical notation for describing and contrasting the wide variety of approaches would be difficult 
and cumbersome. As an alternative we have chosen to describe each model in terms of two 
broad groups of characteristics. First is the "state space" -- the set of variables used by the model 
to characterize an individual at each age. The second is the "laws of motion" -- the set of formu- 
lae or dynamic processes used to generate individuals’ status at age "a" conditional on their prior 
status. All the models to be reviewed have this basic structure in common; they create a cohort 
of individual life histories or biographies recursively by positing a set of individuals at birth, and 
then synthesizing the individuals’ life courses according to well-defined quantitative
descriptions of the relevant dynamic (and often stochastic) processes, like mortality.


A number of the models use very simple state spaces, for example including only whether 
the individual was alive or dead, and if alive disabled or not. Others are considerably richer. 
Also, most of the models do not have individuals represented explicitly. Rather they are consid- 
ered only in terms of groups or cells, as in a life table which groups individuals by age and gen- 
der. However, individuals are always implicit, and this is the key to the common conceptual 
framework to be defined and applied in order to summarize the models to be reviewed. 

An idealized biography or longitudinal microdata record for an individual’s lifecycle forms 
the basic structure we shall use to describe each model’s "state space". It must be augmented by 
a description of the set of individuals who form the population covered by the model. This is 
usually an age cohort. The "laws of motion" of the model then give a method for creating an 
individual’s state space at age "a" as a function of his or her state space at earlier ages. The 
approach of defining a general data structure for the state space of DFLE and related models, and 
then explicating the models’ laws of motion, provides a common analytical framework for 
understanding and comparing the models -- one of the goals of this paper. 


1 The third international workshop of the Network on Health Expectancy (Réseau Espérance de
Vie en Santé, REVES) was held December 3-4, 1990 at Duke University (Durham, NC, USA).
This workshop focused on "healthy life expectancy: methods of calculation and other
methodological considerations", and included discussion of the variety of models of disability-free life
expectancy (DFLE). One suggestion coming out of the workshop was that a working group be
established to review these models, plus means of extending them to consider explicit
representations of chronic disease processes. This paper has served as a point de départ for such a working
group.


This approach of developing a common analytical framework is also more than a concep- 
tual exercise -- models described in this form can be implemented in the form of computerized 
longitudinal microsimulation models. Thus, a second objective is to introduce the concept of 
microsimulation to the DFLE discussion. Another objective is to consider generalizations of the 
DFLE concept itself, and similar indicators like healthy life expectancy (HLE) and active life 
expectancy (ALE). These are restrictive in terms of the range of health states considered. Pop- 
ulation Health Expectancy (PHE; Wolfson, 1992) expands on these indicators by building on a 
multidimensional spectrum of health states. Finally, since the analytical framework to be used in 
reviewing various models (plus its implied computer implementation) is a generalization that 
potentially nests the models as special cases, it may serve as the basis for bringing together the 
strong points of the various models. 


The models to be reviewed are by no means exhaustive or representative. Rather the mod- 
els have been selected to illustrate the range of phenomena that might be brought together in 
future developments. We start with a few examples of life table models. Then, because of 
interest in descriptions of chronic disease processes and patho-physiological changes with age, 
the review includes several models of chronic disease, even though these models are not 
designed to produce DFLE or related kinds of measures. We also include a pair of more theoret- 
ical models that deal with unobserved heterogeneity. The last two sets of models represent the 
authors’ current work. Specifically, the models reviewed are: 


° Sullivan (1971) -- single decrement period life table with one morbidity state to compute
DFLE;


° Wilkins and Adams (1983) -- single decrement period life table with multiple disability 
states and subjective "weighting" for degree of impairment to compute DFLE and HLE;


° Rogers et al. (1989) and Crimmins et al. (1989) -- multi-state increment-decrement period 
life table models to compute DFLE or ALE; 


° Weinstein et al. (1987) -- discrete state-discrete time cell-based CHD morbidity and mor- 
tality process with a limited number of risk states embedded in calendar time to project 
CHD prevalence and mortality; 


° Gunning-Schepers (1988) -- discrete state-discrete time cell-based cause-specific mortality 
model with risk factors embedded in calendar time to project cause-specific mortality con- 
ditional on risk factor prevalence scenarios; 


° Vaupel and Yashin (1985), Vaupel (1988) -- abstract steady-state models assuming various 
forms of latent population heterogeneity to explore mortality risk and intergenerational 
transmission of "frailty"; 


° Manton and Stallard (1991), Dowd and Manton (1991), Manton, Stallard and Woodbury 
(1991) -- stochastic process models with non-stationary risk factor and mortality dynamics, 
heterogeneity of diseases, multivariate relationships between disease and disability types, 
latent "clusters" of disabilities, competing mortality risks, and foregone earnings associated
with premature death to estimate ALE for the U.S. elderly, and to estimate chronic disease 
burdens and mortality based on risk factor scenarios; and 


° Wolfson (1992) -- the POHEM model using monte carlo microsimulation with multivari- 
ate nuptiality and labour market dynamics, unobservable heterogeneity, multivariate risk 
factor dynamics, disease-specific morbidity and mortality processes, treatment techniques, 
functional limitations and health status to estimate steady-state PHE (or DFLE as a special 
case) as well as disease- or risk factor-deleted PHE and health care costs. 


General Framework 


Our review starts with a general data structure based upon a set of longitudinal microdata 
records or "biographies" for a population of individuals, shown in Figure 1. Each "biography" is 
the realization, in a rectangular data array, of underlying health and socio-demographic pro- 
cesses. The horizontal dimension is age (or time). The record stops at death. The vertical 
dimension spans the individual’s "state space" -- the set of discrete or continuous variables that 
describe the individual such that at each age the future health state of the person can be predicted 
with only current information. The state space may include any of the following variables (all 
except genetic endowment measured over time). Most of the models being reviewed are
restricted to a very small subset: 


[Figure 1 About Here] 
° genetic endowment -- genetic disorders and predispositions; 


° environmental exposures -- attributes of the physico-chemical and socio-cultural environ- 
ments that are the sources of noxious exposures, and affect the ability to cope with func- 
tional limitations or disabilities (Minaire, 1990); 


° socio-economic status -- years of schooling and educational attainment, labor force partici- 
pation, earnings, other income, wealth, tenure, career stage, marital status, fertility, etc; 


° other risk factors -- both physical (e.g. cholesterol, obesity, hypertension) and lifestyle (e.g. 
smoking, substance abuse); 


° clinical symptoms and diseases -- CHD (coronary heart disease), cancer, musculoskeletal 
diseases, dementias, etc; 


° functional status / disability -- activities of daily living (ADL), instrumental activities of 
daily living (IADL), multidimensional functional status indicators such as the International
Classification of Impairments, Disabilities and Handicaps (World Health Organization, 
1980), multi-attribute health status scales (e.g. Ontario, 1990), etc.; 


° health service utilization -- physician visits, home health services, hospital visits, surgery, 
drugs, nursing home use, etc. (measured in natural heterogeneous units, e.g. procedures by 
type, bed-days);

° summary economic costs -- a single dollar amount for health services consumed plus other 
economic costs like foregone earnings; and 


° summary health status index -- a health status value score, e.g. based on "Quality Adjusted 
Life Year" (QALY) values or multi-attribute utility or value (e.g. Torrance, 1987; Williams
et al., 1990; but note Loomes and McKenzie, 1990).


These variables in an individual’s state space all refer to a specific time when an individual 
is age "a". However, many processes depend on cumulative exposure or experience; for example,
lung cancer risk functions depend on exposure to tobacco smoking, which is often measured in
cumulative pack-years. When dynamics or the laws of motion for a given process depend on an
extended history or a series of lagged variables, and not just on the variables at age a-1, the
process is not first order Markov, and the state space can be characterized as deficient (Man- 
ton, Stallard and Woodbury, 1991). Such deficiency can be remedied by explicitly including 
time-weighted average or cumulative variables as additional elements in the state space. For 
example, a row could be added in Figure 1 so that the state space included cumulative pack years 
as one variable. This approach allows higher order stochastic processes to be represented as a 
first order Markov process. Alternatively and more compactly, the state space shown in Figure 1
could be left unchanged, and such variables included implicitly. All that is required is
recognizing that arbitrary functions defined over the trajectories of variables up to age a are also
effectively part of the individual’s state space.
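
A minimal sketch of this augmentation, in Python, is given below. Cumulative pack-years are carried as an explicit state variable, so the column at age a is built from the column at age a-1 alone. The names `SmokingState` and `step`, and the convention that one pack per day for one year equals one pack-year, are illustrative assumptions and are not taken from any of the models reviewed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmokingState:
    """One column of a (hypothetical) state array at a given age."""
    age: int
    packs_per_day: float     # current smoking intensity
    cum_pack_years: float    # cumulative exposure accrued before this age


def step(state: SmokingState, next_packs_per_day: float) -> SmokingState:
    """Build the column at age a from the column at age a-1 alone.

    The running total absorbs the whole smoking history, so no lagged
    variables are needed: the augmented process is first order Markov.
    """
    return SmokingState(
        age=state.age + 1,
        packs_per_day=next_packs_per_day,
        cum_pack_years=state.cum_pack_years + state.packs_per_day,
    )


# Usage: three years at one pack per day, starting at age 20.
s = SmokingState(age=20, packs_per_day=1.0, cum_pack_years=0.0)
for _ in range(3):
    s = step(s, next_packs_per_day=1.0)
```

A risk function for lung cancer could then be written as a function of `s.cum_pack_years` only, without access to earlier columns.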


Figure 1 shows there may be other relevant individuals, e.g., a spouse or child. These indi- 
viduals effectively define the third dimension, the population of which the individual is a mem- 
ber. The year-to-year evolution of these other individuals, particularly "significant others", may
interact with that of the first individual -- marriage and fertility are demographic examples; 
availability of a spouse to assist a disabled individual in activities of daily living is another. In 
many life table models, the population or set of persons covered by the analysis is a synthetic 
birth cohort. These life table models embody the assumption that the behavior of individuals is 
completely independent. This is a strong assumption which is not necessary in the more general 
framework of Figure 1. Interdependencies can be embodied in the laws of motion for an individ- 
ual by making some of the state variables at age a dependent not only on his or her own bio- 
graphical history up to age a, but also on the biographical histories of "significant others". 


The general data structure shown in Figure 1 is thus effectively an "omnibus" longitudinal 
microdata set. The outputs of interest for all the models listed above could be computed from 
these microdata if we had actual observations on limited subsets of the rows. However, such 
data are not generally available, so the required subset of this data structure is effectively synthe- 
sized by each model. Moreover, in all but the last model to be reviewed, the microdata them- 
selves are never explicitly synthesized; rather the models create totals, averages, variances, or 
other statistical descriptions of partially aggregated groups of individuals. 


Each of the models to be reviewed constructs the experience of a cohort. Each individual 
in the cohort is represented by a subset of the rows in Figure 1. This construction process 
requires initializing the relevant rows in one column of the state array, usually the leftmost. 
Then each successive column may be generated or synthesized according to laws of motion esti- 
mated from exogenous data, and the contents of columns in the array to the left. Note that by 
identifying "columns" in the array, we imply an assumption of discrete time. Continuous time
models could be represented by referring to a vertical line at age a instead of a column with non- 
zero width. 
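
The recursive construction can be sketched generically in Python under the discrete time assumption. The function `synthesize_biography`, the `Column` type, and the signature of `law_of_motion` are hypothetical names chosen for illustration; `toy_law` is a deliberately trivial stand-in for laws of motion estimated from data.

```python
import random
from typing import Callable, Dict, List, Optional

Column = Dict[str, float]   # one age's values for the rows in use


def synthesize_biography(
    initial: Column,
    law_of_motion: Callable[[int, List[Column], random.Random], Optional[Column]],
    max_age: int,
    rng: random.Random,
) -> List[Column]:
    """Fill the state array left to right, one column per year of age.

    `law_of_motion` sees every column to its left and returns the next
    column, or None when the individual dies and the record stops.
    """
    columns = [initial]
    for age in range(1, max_age + 1):
        nxt = law_of_motion(age, columns, rng)
        if nxt is None:
            break
        columns.append(nxt)
    return columns


def toy_law(age: int, history: List[Column], rng: random.Random) -> Optional[Column]:
    """Illustrative law of motion: survive to age 2, then die with certainty."""
    if age >= 3:
        return None
    return {"alive": 1.0}


biography = synthesize_biography({"alive": 1.0}, toy_law, max_age=100,
                                 rng=random.Random(0))
```

Each model reviewed below then amounts to a particular choice of rows in `Column` and a particular `law_of_motion`.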


Most of the models do not represent individual biographies explicitly. Those based on life 
tables, for example, have as their finest level of disaggregation a sub-population defined for 
example by age range, gender, and disability group. This aggregate approach greatly simplifies 
computation. The cost is an inability to account for individual heterogeneity. The laws of 
motion being used must be defined as average transition probabilities or differential equations 
applying identically to all members of a group. This cannot accurately represent individual 
behavior. In turn, it could be an important concern when planning health interventions which 
must affect individual-level processes depending in complex ways on medical and socio- 
economic histories. An alternative is synthesizing the biographies of each of the individuals (or 
a representative sample thereof) in the relevant cohort or population. This latter microdata 
approach affords complete flexibility in representing and analyzing individual heterogeneity. 


The framework of Figure 1 provides a general basis for describing the various models to be 
reviewed in two main parts -- the state space and the laws of motion. In addition, it may be use- 
ful to distinguish two aspects of each modeling effort. One is the estimation or statistical infer- 
ence strategy applied to observed data. These empirical strategies result in distilled essences or 
inferred regularities from the data that form the "fact base" or empirical grounding for the model.
They can range from drawing on conventional and widely used data series like mortality rates, to 
cross-tabular analyses of household surveys (for example, to determine disability prevalences), to
highly sophisticated methods for estimating latent multivariate structure such as fuzzy clusters of
disabilities, to meta-analysis where important segments of the epidemiological literature are 
reviewed to come up with a synthetic description of various disease processes. 


The second aspect of each modeling effort is the method used to draw out the implications 
of the selected empirical observations -- the deductive strategy. Computerized numerical analy- 
ses underlie all the models to be reviewed. Some are based on multi-state life tables; others use 
more general kinds of cell-based structures where a cell represents a group of individuals with a 
range of characteristics in common. Yet others use systems of differential equations. The
deductive strategy used in the last model is a specific instance and application of monte carlo 
microsimulation modeling (MSM). 


Monte carlo MSM is a general methodology with applications ranging from high energy 
particle physics to social science (Orcutt et al., 1976), where both applications date from the
earliest years of computing. Monte carlo MSM in the more general sense of a deductive framework
or method for solving complex systems of equations can be applied to models of the form 
represented by the population of individuals in Figure 1. Since this same population can be used 
to represent all the models to be reviewed, we can think of solving each of these models using
monte carlo MSM methods. Thus, the review will consist of the thought experiment of first 
translating the state space and laws of motion of each of the models to be reviewed into the 
framework of Figure 1, and then sketching how each model can then be "solved" using monte 
carlo MSM. The monte carlo MSM method itself will be described in the course of reviewing 
the various models, which are generally in order of increasing complexity. 


Other features of the models’ deductive strategies may also be relevant. For example, 
some models embody steady state assumptions so that the populations generated are hypotheti- 
cal. Others are embedded in calendar time and attempt to make realistic projections. Finally, the 
empirical and deductive strategies may be more or less tightly coupled. For example, simple 
empirical strategies tend to be coupled with simple deductive strategies. An important question 
is the extent to which more complex empirical strategies can be decoupled from the deductive 
analyses with which they are currently associated. Such decoupling would facilitate richer 
combinations of empirical observation and deductive analysis, a point to which we return at the
end of the paper. 


Sullivan’s Method 


Sullivan’s (1971) method is the standard basis for estimating DFLE. We shall develop its 
representation in terms of the framework of Figure 1 gradually, because this also serves to intro- 
duce the monte carlo microsimulation method. Sullivan’s DFLE calculation requires an array for 
each individual’s biography with at most three rows; fewer could be used, but the three rows also 
set the stage for subsequent discussion. The first is a row which shows if the individual is or is 
not disabled; the second is a summary health status row; and the third shows if the individual is
alive or dead. In other words, the Sullivan state space has three attributes for each individual --
disabled or not, health status, and alive or dead.


The "alive or dead" row for a given individual is generated by "birth" at age zero (a degen- 
erate array with zero columns). Columns representing successive years of life are generated 
recursively, i.e., if the individual is alive at age a-1, determine the (sex-specific) mortality rate
m(a); draw a random number from a uniform distribution over the interval [0, 1]; if the value
drawn is more than m(a) then the individual lives and computations are repeated; otherwise the
individual dies and the array terminates. In other words, the law of motion for the "alive or
dead" element of the state space is simply a first order Markov process based on age-specific
mortality rates.


If we repeat this process of generating complete "alive or dead" rows many times, a monte
carlo sample of (trivially simple) "synthetic" individual life paths or biographies is generated. 
The proportion of the sample alive at each age approximates the survival proportions of a cohort 
in a life table. Summing the proportions surviving in each age interval (after multiplying by the 
number of years expected to be lived in each interval) gives an estimate of life expectancy. 
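
The two steps just described, generating "alive or dead" rows and summing survival proportions, can be sketched in Python as follows. This is a simplified illustration: it assumes single-year age intervals and counts each survived year as one full year, ignoring the fraction of a year lived in the year of death. The mortality schedule `rates` is hypothetical.

```python
import random
from typing import List, Sequence


def alive_row(m: Sequence[float], rng: random.Random) -> List[int]:
    """Return one entry (1) per year of age survived.

    m[a] is the probability of dying in the year of age a, given survival
    to that year. The last rate should be 1.0 so every record terminates.
    """
    row = []
    for rate in m:
        if rng.random() > rate:   # draw exceeds m(a): the individual lives
            row.append(1)
        else:                     # otherwise the individual dies
            break
    return row


def life_expectancy(rows: Sequence[List[int]], n_ages: int) -> float:
    """Sum over ages of the proportion of the sample surviving each year."""
    n = len(rows)
    proportions = [sum(1 for r in rows if len(r) > a) / n for a in range(n_ages)]
    return sum(proportions)


# Hypothetical schedule: constant 0.5 hazard, certain death in the last year.
rates = [0.5] * 60 + [1.0]
rng = random.Random(12345)
sample = [alive_row(rates, rng) for _ in range(20000)]
estimate = life_expectancy(sample, len(rates))
```

With this schedule the expected number of whole years survived is close to 1, so `estimate` should fall near that value, up to monte carlo sampling error.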


Given each individual’s "alive or dead" row synthesized to age a, the next step in Sullivan’s
method is to append column elements at age a for rows representing disability and summary
health status. In the Sullivan method, the incidence of disability is independent of prior
disability status. Thus, at each age a we "look up" the percentage d(a) of the population age a 
who are "disabled" (for whatever definition used, e.g., based on tabulation of a cross-sectional 
survey) and draw a random number from a uniform distribution over the interval [0, 1]. If it is 
less than d(a) then the individual is "disabled" that year (i.e., the row element is set to 0); other- 
wise the disability value at age a is 1 for "healthy". The law of motion for disability is thus sim- 
ple independent random assignment based on observed proportions. 


Finally, the summary health status row element for the individual at age a is calculated in a 
simplistic manner. This is to set health status equal to the disability value -- 1 if alive and 
healthy, 0 otherwise. In other words, the effective health status valuation function treats being 
disabled the same as being dead. Given these annual algorithms, entries for disability and sum- 
mary health status are computed repeatedly and the rows are extended until the individual "dies." 


As in the calculation of life expectancy just described, this process representing disability 
and summary health status over one individual’s lifecycle can be repeated until a large popula- 
tion of synthetic biographies is created. If we average the summary health status vectors and 
compute the sum of this vector of averages, we estimate the Sullivan DFLE. It is cumbersome to
translate the life table methods used to compute Sullivan’s DFLE into the state space and laws of
motion of Figure 1, but it sets the stage for more complex models. It also highlights certain 
assumptions -- e.g., that mortality is independent of disability, disability is a binary variable (0, 
1) indicating presence or absence, and disability at age a is independent of prior disability. 
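
As an illustration, the three-row Sullivan biography and the resulting DFLE estimate can be sketched in Python and compared with the aggregate life table calculation. The schedules `m` and `d` are hypothetical, single-year age intervals are assumed, and the function names are chosen here for exposition only.

```python
import random
from typing import List, Sequence, Tuple


def sullivan_biography(
    m: Sequence[float], d: Sequence[float], rng: random.Random
) -> Tuple[List[int], List[int], List[int]]:
    """Return the (alive, disability, health status) rows for one individual."""
    alive, disability, health = [], [], []
    for a in range(len(m)):
        if rng.random() <= m[a]:                   # dies: the record stops
            break
        alive.append(1)
        status = 0 if rng.random() < d[a] else 1   # 0 = disabled, 1 = healthy
        disability.append(status)
        health.append(status)                      # disabled valued like dead
    return alive, disability, health


def monte_carlo_dfle(m, d, n: int, rng: random.Random) -> float:
    """Average the health status rows and sum the averages over ages."""
    total = 0
    for _ in range(n):
        _, _, health = sullivan_biography(m, d, rng)
        total += sum(health)
    return total / n


def life_table_dfle(m, d) -> float:
    """Aggregate counterpart: survivors at each age times (1 - d(a))."""
    surviving, dfle = 1.0, 0.0
    for a in range(len(m)):
        surviving *= 1.0 - m[a]
        dfle += surviving * (1.0 - d[a])
    return dfle


# Hypothetical schedules: 1% annual mortality, disability rising with age.
m = [0.01] * 80 + [1.0]
d = [0.1 + 0.005 * a for a in range(81)]
mc_estimate = monte_carlo_dfle(m, d, n=20000, rng=random.Random(2024))
lt_estimate = life_table_dfle(m, d)
```

The two estimates should agree up to monte carlo sampling error, since the mortality draw never looks at the disability row and the disability draw never looks at earlier columns.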


The monte carlo microsimulation thought experiment just applied to translate the Sullivan 
method into the structure underlying Figure 1 and then solve it is computationally more expen- 
sive than the life table computations for Sullivan’s DFLE, and is subject to additional error. That 
is, in addition to the statistical variability of the parameters estimated from the observed data 
(e.g. mortality rates, disability prevalences), there is sampling variability arising from the appli- 
cation of monte carlo methods to draw out the implications of those statistical estimates. This 
error can be quantified using non-parametric sample re-use methods. In turn, the error 
information could be used to estimate the necessary sample size in the synthesized cohort to 
assure that statistical estimates of variables or indicators of interest (e.g. DFLE) have the desired 
precision. We cannot, of course, improve the statistical precision beyond that in the original 
parameter estimates. 
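
One such sample re-use method is the bootstrap, named here as an assumption since the text does not single out a particular technique. The Python sketch below resamples the synthetic individuals with replacement to estimate the monte carlo standard error of a mean, then scales a pilot run to a target precision using the usual 1/sqrt(n) behaviour of the error. The pilot values are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence


def bootstrap_se(values: Sequence[float], n_boot: int, rng: random.Random) -> float:
    """Standard error of the sample mean, by resampling with replacement."""
    n = len(values)
    means = []
    for _ in range(n_boot):
        resample = rng.choices(values, k=n)
        means.append(sum(resample) / n)
    return statistics.stdev(means)


def required_sample_size(se_pilot: float, n_pilot: int, se_target: float) -> int:
    """Scale a pilot run, given that the error shrinks like 1/sqrt(n)."""
    return math.ceil(n_pilot * (se_pilot / se_target) ** 2)


# Hypothetical pilot run: each value is one synthetic individual's healthy years.
pilot = [0.0, 1.0] * 500
se = bootstrap_se(pilot, n_boot=200, rng=random.Random(7))
```

Halving the monte carlo standard error therefore requires roughly four times as many synthetic biographies; it does nothing to reduce the uncertainty inherited from the estimated mortality rates and disability prevalences.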


Wilkins and Adams 


Wilkins and Adams (1983) relaxed Sullivan’s assumptions about disability states. Mortal- 
ity follows the same process. However, instead of two disability states (yes or no), they defined 
six. As in Sullivan, which disability state a person is in at age a is independent of prior 
disability. It is based on a set of age- (and sex-) specific prevalences derived from cross- 
sectional data. Thus, d(a) represents the cumulative probability distribution across a set of six 
"disability" states. For each disability state, a summary health status score is assigned using the 
following (arbitrary) scoring: no disability = 1.0, short-term impairment = 0.5, chronic minor
restriction = [...], chronic major restriction = 0.6, chronic severe disability = 0.5, institutionalized
= 0.4, and dead = 0.0.


The translation of the Wilkins and Adams method to Figure 1 is straightforward. The gen- 
eration of the "alive or dead" row is identical to that for Sullivan. The disability row is generated 
by draws from the six state disability distribution d(a), and the health status row is computed 
from the disability states by applying the scores just listed. 


Given a large sample of synthetic biographies (or two samples, for males and females), 
several parameters can be estimated. For example, expectation of life without any disability 
(DFLE) is the sum of all the person-years lived in the "no disability" state divided by the sample 
size. Similarly, the expected time spent institutionalized is the sum over all person-years spent in 
the "institutionalized" state divided by the sample size. A simple form of health status-adjusted 
life expectancy or "population health expectancy" (PHE) can be computed as the sum of all the 
health status row elements (i.e. summing across both ages and individuals) divided by the simu- 
lated monte carlo sample size. 
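
A minimal Python sketch of this calculation follows. The prevalences and mortality schedule are hypothetical, single-year age intervals are assumed, and the score for the chronic minor state is not recoverable from the text, so the 0.7 used below is purely a placeholder. State labels are shorthand invented for the example.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, Sequence

STATES = ("none", "short_term", "chronic_minor", "chronic_major",
          "chronic_severe", "institutionalized")


def draw_state(prevalence: Sequence[float], rng: random.Random) -> int:
    """Draw a state index from the cumulative distribution d(a)."""
    cumulative = list(accumulate(prevalence))
    u = rng.random() * cumulative[-1]
    return min(bisect_right(cumulative, u), len(cumulative) - 1)


def simulate(m, prevalence_by_age, scores, n: int,
             rng: random.Random) -> Dict[str, float]:
    """Return expected years per state, and health-adjusted years (PHE)."""
    years = [0] * len(STATES)
    health_total = 0.0
    for _ in range(n):
        for a in range(len(m)):
            if rng.random() <= m[a]:      # death ends the record (score 0.0)
                break
            state = draw_state(prevalence_by_age[a], rng)
            years[state] += 1
            health_total += scores[state]
    result = {name: years[i] / n for i, name in enumerate(STATES)}
    result["PHE"] = health_total / n
    return result


# Hypothetical inputs: everyone survives ten years, fixed prevalences.
m = [0.0] * 10 + [1.0]
prevalence = [[0.5, 0.1, 0.1, 0.1, 0.1, 0.1]] * len(m)
scores = [1.0, 0.5, 0.7, 0.6, 0.5, 0.4]   # the 0.7 is a placeholder assumption
out = simulate(m, prevalence, scores, n=20000, rng=random.Random(3))
```

Here `out["none"]` plays the role of DFLE, `out["institutionalized"]` is the expected time spent institutionalized, and `out["PHE"]` is the simple health status-adjusted life expectancy.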


Again, this representation in terms of a Figure 1-style longitudinal microdata set generated 
by monte carlo microsimulation is more complex and computationally intensive than the aggre- 
gate life table methods used by Wilkins and Adams. But it serves to illustrate the basic structure 
and assumptions of their method. 


Increment-Decrement Multi-State Life Tables 


The increment-decrement methods used by Rogers et al. (1989) to analyze the U.S. Longi- 
tudinal Survey on Aging (LSOA) and Crimmins et al. (1989) relax assumptions of the Sullivan 
method. Like Sullivan, due to information restrictions, there is one disability state (and being 
"healthy"). The same summary health status algorithm is used, i.e. a value of 1.0 if alive and 
healthy, 0.0 otherwise. The difference is that transitions among alive/dead and healthy/disabled 
states are not assumed independent. This is represented by first order Markov transition
probabilities that depend on prior disability. If the individual is alive and healthy at age a-1, the
transitions to age a are: no change, impairment, or death. Similarly, if the individual is alive and
disabled at age a-1, the transitions are: no change, become healthy, or death. The transitions are
estimated conditionally on the prior state.


To represent this, assume that an individual’s biography is synthesized up to age a-1 (and
consists of the same three rows as in the two previous examples). The question is how to
represent these joint laws of motion for mortality and disability as they are applied to add another
column to the individual’s state array. Empirical observations give us estimates of two sets of
transition probabilities -- m(i,a) and d(i,a), the probability of dying, and the probability of
changing disability level respectively, at age a given disability status i at age a-1. In terms of
monte carlo microsimulation, these life table models are equivalent to first drawing a random
number from a uniform distribution over the range 0 to 1. If the number is in the [0, m(i,a)]
interval, the individual dies; if in the [m(i,a), m(i,a) + d(i,a)] interval, the individual survives but
changes disability status; if the number is in the [m(i,a) + d(i,a), 1] interval the individual
survives and remains in the same disability state.


Draws of random numbers and tests against exogenously estimated probabilities are 
repeated over ages to complete the individual’s synthetic biography; many such biographies are 
generated and averaged. The DFLE statistic is computed as before. The increment-decrement 
method is more realistic in that transitions out of, as well as into, disability are allowed. These 
can depend on disability status. Mortality depends on disability status. However, there is no 
representation of the underlying disease processes or risk factors giving rise to the disabilities; 
and being disabled is weighted the same as being dead. 
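
The single-draw rule for the joint mortality and disability transition can be sketched in Python as below. The transition probabilities are hypothetical, the individual is assumed healthy at age 0, and the coding (1 = healthy, 0 = disabled) follows the Sullivan description; function names are illustrative.

```python
import random
from typing import Dict, List, Optional, Sequence

HEALTHY, DISABLED = 1, 0

Rates = Dict[int, Sequence[float]]   # status i -> probabilities indexed by age a


def next_status(status: int, a: int, m: Rates, d: Rates,
                rng: random.Random) -> Optional[int]:
    """One uniform draw, partitioned into death / change / no change."""
    u = rng.random()
    if u < m[status][a]:
        return None                                   # dies
    if u < m[status][a] + d[status][a]:
        return DISABLED if status == HEALTHY else HEALTHY
    return status


def biography(m: Rates, d: Rates, max_age: int, rng: random.Random) -> List[int]:
    """Disability row for one individual, assumed healthy at age 0."""
    row = [HEALTHY]
    for a in range(1, max_age + 1):
        status = next_status(row[-1], a, m, d, rng)
        if status is None:
            break
        row.append(status)
    return row


def dfle(m: Rates, d: Rates, max_age: int, n: int, rng: random.Random) -> float:
    """Healthy person-years per synthetic individual (disabled years score 0)."""
    return sum(sum(biography(m, d, max_age, rng)) for _ in range(n)) / n


# Hypothetical rates: the healthy never die but half become disabled each year;
# the disabled never recover and die within the year.
ages = 61
m = {HEALTHY: [0.0] * ages, DISABLED: [1.0] * ages}
d = {HEALTHY: [0.5] * ages, DISABLED: [0.0] * ages}
estimate = dfle(m, d, max_age=60, n=20000, rng=random.Random(11))
```

Under these rates the expected number of healthy years is close to 2 (one year at age 0 plus 0.5 + 0.25 + ...), which gives a quick check on the simulation.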


All three types of life table models reviewed so far created a synthetic birth cohort with a 
stationary age and disability structure. The underlying laws of motion are simple discrete state- 
discrete time zero- or first-order Markov transition models whose probabilities are estimated in a 
recent period for survivors of multiple birth cohorts at different ages. This steady state analytical 
strategy abstracts from changes in age structure. These results, like conventional period life 
expectancy estimates, are somewhat analogous to "first derivatives" of society’s current position 
-- an indication of where we are heading if current instantaneous rates remained unchanged. 
However, the results bear no relation to calendar time, either history or prospect; they do not 
describe period changes or cohort trends. As well, the first two models are based on a melange 
of period stocks (disability or institutionalization prevalences) and period flows (mortality rates). 


Weinstein et al. and Coronary Heart Disease 


Weinstein et al. (1987) have developed a model focused on coronary heart disease (CHD). 
Their model describes the disease process for an actual population (instead of a synthetic birth 
cohort). This population consists of a sequence of birth cohorts embedded in calendar time. It 
explicitly incorporates risk factors for CHD morbidity and mortality as well as health care
interventions. The objectives are to project CHD morbidity and mortality -- not to estimate DFLE or
related health status statistics. The model is relevant to this analysis because it includes a detailed
quantitative description of a disease process, a feature that should be incorporated in future
generalizations and extensions of the DFLE literature.


As with the models just reviewed, we shall describe how the Weinstein et al. CHD model 
can be recast as a synthetic longitudinal microdata sample in the framework of Figure 1. The 
Weinstein et al. model cross classifies the population by age, gender, discrete risk factor level, 
and disease history thereby defining about 500 cells or categories, i.e. it is a cell-based discrete 
time-discrete state model. The model’s four risk factors (total serum cholesterol, diastolic blood 
pressure, smoking, and obesity) each have three levels. This can be represented by utilizing four 
rows in the risk factor part of the array in Figure 1. The "laws of motion" used to generate each 
individual’s quadrivariate risk profile by age are applied according to the following process: at 
the beginning of each five-year age interval, draw a risk factor level randomly and independently 
from each of four univariate age- and sex-specific distributions (in turn described by two or three 
categorical means) observed in a recent survey of the U.S. population (the NHANES), and hold 
it fixed for the next four years. Thus at the individual level, conditional on age and sex, risk 
factors are assumed independent. 
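
The risk factor step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Weinstein et al. code: the level probabilities in LEVEL_PROBS are invented placeholders rather than NHANES estimates (and are not indexed by age or sex here), and the function name draw_risk_profile is hypothetical.

```python
import random

RISK_FACTORS = ["cholesterol", "diastolic_bp", "smoking", "obesity"]

# Hypothetical probabilities of the three levels (0 = low, 1 = medium, 2 = high).
# A fuller version would index these by age group and sex.
LEVEL_PROBS = {
    "cholesterol": [0.5, 0.3, 0.2],
    "diastolic_bp": [0.6, 0.3, 0.1],
    "smoking": [0.6, 0.2, 0.2],
    "obesity": [0.5, 0.3, 0.2],
}

def draw_risk_profile(start_age, end_age, rng):
    """Return {age: {factor: level}}, with levels redrawn every five years."""
    profile = {}
    current = {}
    for age in range(start_age, end_age + 1):
        if (age - start_age) % 5 == 0:
            # Start of a five-year interval: one independent draw per factor.
            current = {
                f: rng.choices([0, 1, 2], weights=LEVEL_PROBS[f])[0]
                for f in RISK_FACTORS
            }
        # Otherwise the levels are held fixed for the rest of the interval.
        profile[age] = dict(current)
    return profile

rng = random.Random(0)
profile = draw_risk_profile(30, 44, rng)
```

Each entry of profile corresponds to the four risk factor rows of one column of the array in Figure 1. Because the four draws are made separately, the sketch reproduces the independence assumption noted above.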


The probability of a CHD event at each age depends on whether or not the individual had a 
prior event. If there is no prior event, the probability of the first event is based on a hazard esti- 
mated from a second data set (the 18 year follow-up of the Framingham Heart study) which 
includes age, sex, and the three levels of each of the four risk factors. Given a first event, 
another set of age-specific probabilities determine which type of event it is (myocardial infarc- 
tion, cardiac arrest, angina, or a combination). If there has been a previous CHD event, then the 
probability of another event depends on age and the events that have already occurred (but not on 
risk factors). In Figure 1, there would be a single CHD disease row which showed if events 
occurred at a given age, and if so, how many, and what kind (e.g. one MI, or angina and an 
arrest). 


CHD mortality in the Weinstein et al. model depends on age, sex, whether this is a first or 
subsequent event, and the kind of event. Mortality from other causes depends only on age and 
sex. Health care utilization (e.g., coronary bypass graft surgery) depends only on the type of 
event. It has no subsequent effect on morbidity or mortality. 


We shall now sketch how the Weinstein et al. model would look if recast in the framework 
of Figure 1. Since this model operates in "calendar" time, the process for creating the underlying 
implicit sample of synthetic biographies developed for the life table models has to be modified. 
Essentially, a series of overlapping birth cohorts has to be generated. First, we create a cohort 
aged 80-84 in 1990 (say) by initializing a sample of individuals each with a column of informa- 
tion at age 75-79, and then following each individual in this birth cohort sample (i.e., adding col- 
umns to their state arrays) until each dies. (Even though the laws of motion in this model use 
rates that are the same for five or ten year age groups, the simulations use an annual time step.) 
The mortality rates for causes other than CHD are projected and have both time and age sub- 
scripts. We then create a cohort aged 75-79 in 1990 and follow it until everyone dies. This is 
continued to the cohort aged 30-34 in 1990. Finally, we create 30-34 year old cohorts starting in 
1995, 2000, etc. to complete the set of birth cohorts. The size of each cohort corresponds to its 
current (or projected) population size as appropriate. These birth cohorts are "aligned" so that 
calendar years correspond. We average across them as before. 
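
A minimal sketch of this cohort construction follows. It is an illustration only: the mortality schedule annual_death_prob is an invented placeholder, every cohort is given the same arbitrary size instead of its actual or projected population, each five-year age group is represented by its lower bound, and CHD events are omitted.

```python
import random
from collections import Counter

def annual_death_prob(age):
    """Hypothetical mortality schedule; a placeholder, not an estimate."""
    return min(1.0, 0.0005 * 1.09 ** (age - 30))

def simulate_cohort(age_at_entry, entry_year, size, rng):
    """Follow one cohort with an annual time step; return calendar years of death."""
    death_years = []
    for _ in range(size):
        age, year = age_at_entry, entry_year
        while rng.random() >= annual_death_prob(age):
            age += 1
            year += 1
        death_years.append(year)
    return death_years

rng = random.Random(1)
deaths_by_year = Counter()

# Cohorts already alive in 1990, from ages 80-84 down to ages 30-34.
for age in range(80, 25, -5):
    deaths_by_year.update(simulate_cohort(age, 1990, 200, rng))

# New cohorts of 30-34 year olds entering in 1995, 2000, and so on.
for year in range(1995, 2015, 5):
    deaths_by_year.update(simulate_cohort(30, year, 200, rng))

# Cohorts are aligned on calendar year, so tabulation is a simple lookup.
deaths_1990_2010 = {y: deaths_by_year[y] for y in range(1990, 2011)}
```

Keying every death by calendar year is what "aligns" the cohorts; the same tabulation could be broken down further by age and sex.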


We thus have a synthetic longitudinal microdata set starting in 1990 for a representative 
population, with new cohorts added in future years. By cross-tabulating features of the individu- 
als we can compute, for example, the time profile from 1990 to 2010 of CHD mortality, and the 
prevalence of CHD morbidity, by age and sex, exactly corresponding to the projections produced 
by the original Weinstein et al. model. 


Given this description, the assumptions and possible extensions are readily identified. For 
example, non-CHD mortality is independent of risk factors, though we would expect smokers to 
be at higher risk of lung cancer death and respiratory diseases. This could be embodied in the 
laws of motion generating the state space represented in Figure 1 by requiring the probability of 
non-CHD death to be conditional on the smoking row in the array up to age a-1, and not just on a 
simple age- (and sex- and calendar time-) specific CHD cause-deleted residual mortality rate. 


The effects of risk factors on many causes of mortality including CHD may not be contem- 
poraneous. Cumulative exposure can be significant. In Figure 1, this could be embodied by 
allowing the probability of a CHD event at age a to be a function of the array of risk factor levels 
over the full range from age 0 to age a. Given the framework of Figure 1, extending the Weins- 
tein et al. model along these lines is direct (see Wolfson and Birkett, 1989). However, such 
extensions are extremely cumbersome if not infeasible within the semi-aggregate cell-based 
structure of the original model. Adding new kinds of conditionality would require multiplicative 
increases in the number of cells (i.e. the number of states), and would quickly render the 
approach impractical. (It could also generate difficult data requirements as highly improbable 
but logically possible cells were defined.) 


Gunning-Schepers and PREVENT 


The PREVENT model (Gunning-Schepers, 1988) is another cell-based model embedded in 
calendar time. It is used to project cause-specific mortality, conditional on various interventions 
to reduce the prevalence of risk factors such as smoking. One of its key features is the modeling 
of lags in the effects of interventions, and interactions with demographic trends. 


PREVENT includes multiple risk factors (smoking, hypertension, serum cholesterol, obe- 
sity, maternal age at first birth). Mortality from eight causes (ischemic heart disease, cerebrovas- 
cular disease, lung cancer, breast cancer, colon cancer, stomach cancer, traffic accidents, "other") 
is analyzed, though there is no representation of morbidity, i.e. disease onset, progression, and 
case fatality. Work is underway to extend the PREVENT model to incorporate morbidity. 


The PREVENT model can be translated into the monte carlo microsimulation framework 
of Figure 1 in a way exactly analogous to that used for the Weinstein et al. model. 


Both PREVENT and the Weinstein et al. models use a different empirical strategy than the 
life table models. Because of the breadth and complexity of the processes they are seeking to 
model, no single data set offers a sufficient basis. Instead, the laws of motion have been 
assembled by an extensive review and synthesis of the medical and epidemiological literature, 
including reference to expert opinion of clinicians. As a corollary, the various processes have 
very limited interdependence. For example, in Weinstein et al., whether a first CHD event is a 
myocardial infarction depends only on age and sex, and not on other information available in the 
model such as risk factor history. Similarly, in the PREVENT model, mortality risks take no 
account of co-morbidity. 


This kind of independence is understandable, given the fragmentary nature of the clinical 
and epidemiological literature which forms the empirical basis of the models. However, the 
combinatoric problems inherent in cell-based models make it very difficult to incorporate inter- 
dependencies. Thus, the deductive strategy of cell-based models may impose constraints on the 
utilization of empirical results, particularly where doing so would result in the unmanageable 
complexity of an explosion in the number of cells or cross-classified categories. In contrast, the 
monte carlo microsimulation approach imposes no such constraints. It has the capacity to absorb 
all the variety of current and potential empirical strategies, subject only to the constraints that the 
resulting laws of motion are coherent and can be expressed algorithmically. 


Vaupel and Yashin 


One of the limitations of aggregate or cell-based models is their limited capacity to incorporate 
individual heterogeneity. Often, this heterogeneity is unobserved -- for example genetic predis- 
position to disease. The expression of these genetic predispositions depends in complex ways on 
the cumulative exposure to environmental and behavioral factors, and is naturally highly varied. 
At the same time, it is important to allow the DFLE literature to expand to incorporate explicit 
descriptions of disease processes, and these in turn may involve characterizing otherwise unob- 
served or latent heterogeneity. For this reason, we review two models which focus on this phe- 
nomenon. 


The first model (Vaupel and Yashin, 1985) gives abstract examples of the implications of 
unobserved heterogeneity. The model is relatively simple with just one parameter -- the propor- 
tion of the cohort endowed at birth with the innate (and unobserved) characteristic "X", and two 
sets of mortality hazards -- one each for those with and without "X". Vaupel and Yashin then go 
on to demonstrate a surprisingly rich variety of behaviors for the mortality hazards of the popula- 
tion as a whole, using mixtures of simple hazard functions for the two sub-populations. 


Another model with unobserved heterogeneity, this time a continuous distribution of inher- 
ited frailty, is given in Vaupel (1988). The outcome of interest in this analysis is the "correla- 
tion" of parents’ and children’s life expectancy. In Vaupel’s model, each individual is endowed 
at birth with a frailty, z, drawn from a continuous distribution. Subsequent mortality follows a 
Gompertz trajectory scaled by the (unobservable, innate and immutable) frailty level z. The 
analysis then derives correlations and other measures of association between parents’ (P) and 
children’s (C) longevity in relation to various assumed correlations of the two continuous distrib- 
utions P(z) and C(z), such as perfect heritability. 


As with the models described previously, these two models can be recast in the monte 
carlo microsimulation framework of Figure 1. In the case of the Vaupel and Yashin (1985) 
model with discrete frailty, the translation could go as follows: As each individual in a synthetic 
birth cohort is "born", a random number is drawn to determine whether or not the individual is 
endowed with "X". This information is recorded in a biographical entry at age zero labelled "ge- 
netic endowment". At each subsequent age, as in all the earlier models, a random number is 
selected to see if the individual dies. However in this case, the (age- and sex-specific) mortality 
rate applied is also conditional on whether or not s/he is endowed with "X"; in other words, the 
applicable mortality rate is a function of (discrete) age a and (discrete) unobserved and fixed-at- 
birth genetic frailty X. Then, by running these simulations for various pairs of hypothetical mor- 
tality hazard functions (one for those with "X", one for those without), exactly the same results 
could be obtained -- albeit with much more computation and a bit of monte carlo noise. 
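
The translation can be illustrated with a short Python sketch. The two constant hazards and the proportion endowed with "X" are invented for illustration and are not taken from Vaupel and Yashin (1985); the function names are hypothetical.

```python
import random

def hazard(age, has_x):
    """Hypothetical constant annual death probabilities for the two groups."""
    return 0.10 if has_x else 0.02

def simulate_life(p_x, max_age, rng):
    """Return (has_x, age_at_death) for one synthetic individual."""
    has_x = rng.random() < p_x       # "genetic endowment" entry at age zero
    for age in range(max_age):
        if rng.random() < hazard(age, has_x):
            return has_x, age
    return has_x, max_age            # still alive at max_age

def population_hazard(lives, max_age):
    """Observed death rate at each age among those still alive."""
    rates = []
    for age in range(max_age):
        at_risk = sum(1 for _, d in lives if d >= age)
        died = sum(1 for _, d in lives if d == age)
        rates.append(died / at_risk if at_risk else float("nan"))
    return rates

rng = random.Random(2)
lives = [simulate_life(p_x=0.5, max_age=60, rng=rng) for _ in range(20000)]
rates = population_hazard(lives, 60)
```

Although each sub-population has a constant hazard in this sketch, the population-level rate in rates declines with age as those with "X" die off sooner, which is the kind of mixture behavior the analytic model is designed to expose.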


Similarly, Vaupel’s (1988) model with continuous inherited frailty can be recast in the 
framework of Figure 1. The translation of this model is very similar; two processes are required 
-- genetic inheritance and mortality. One difference is that at each age, the mortality rate applied 
is a function of a continuous (rather than a discrete) frailty score, z. The main question is how to 
represent the relationships between parents and children. One approach is simply to generate 
individuals in pairs -- the first denoted P for parent, and the second denoted C for child. Then, 
under Vaupel’s working assumption of perfect heritability of frailty, both P and C are endowed 
at birth with identical values of z (drawn at random from P(z) = C(z)). The rest of their biogra- 
phies are then synthesized independently by consulting the mortality function given z, but with 
independent random number draws to determine whether or not each dies in each age interval. 
This process is repeated until a sufficiently large sample of P-C pairs is generated. The relevant 
statistic in each case is computed, namely life length; then the correlation of P’s and C’s life 
lengths is computed, as well as statistics such as the probability of surviving past a given age 
conditional on frailty levels. Again, this thought experiment demonstrates that the Vaupel and 
Yashin (1985) and Vaupel (1988) models can be nested in the framework of Figure 1. 
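
The paired simulation can be sketched as follows. The Gompertz parameters and the gamma frailty distribution (mean one) are assumptions chosen for illustration; they are not the values used in Vaupel (1988), and the helper names are hypothetical.

```python
import math
import random

def death_prob(age, z):
    """Annual death probability from a Gompertz hazard scaled by frailty z.
    The parameters 0.0001 and 0.09 are illustrative placeholders."""
    return 1.0 - math.exp(-z * 0.0001 * math.exp(0.09 * age))

def life_length(z, rng, max_age=120):
    """Age at death for one individual with fixed frailty z."""
    for age in range(max_age):
        if rng.random() < death_prob(age, z):
            return age
    return max_age

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(3)
parents, children = [], []
for _ in range(5000):
    z = rng.gammavariate(4.0, 0.25)        # assumed frailty distribution
    parents.append(life_length(z, rng))    # same z: perfect heritability
    children.append(life_length(z, rng))   # but independent mortality draws
r = pearson(parents, children)
```

Even with perfectly inherited frailty, the correlation r of life lengths stays well below one in this sketch, because the year-by-year mortality draws for P and C are independent.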


Of course the assumption that mortality is independent in Vaupel’s (1988) model once 
genetic correlations have been taken into account is unrealistic, since most parents and children 
share common environments. Identifying the source of the correlation of life expec- 
tancy (i.e., genetic versus environmental) is difficult, as is evident from the well known natu- 
re/nurture controversy about intelligence. In any case, different and more realistic patterns of 
heritability -- and correlations of frailty derived from environmental as well as genetic sources -- 
can be represented not only by drawing the "genetic" frailty levels of P and C from a joint distri- 
bution with less than perfect correlation, but also by allowing mortality rates to be conditional on 
environmental factors acting at various ages. Of course, this theoretical possibility does not 
reduce the very large practical difficulties of explicating and quantifying such environmental 
influences. 


The cost in computational complexity of these translations to the Figure 1 framework is 
considerable. It is almost certainly higher than the costs of the analytic methods used by Vaupel 
and Yashin for the stylized theoretical models they consider. However, the cost-benefit of the 
alternative methods shifts as the complexity of the models increases. Trying to assess the impact 
of more realistic models of the effects of heterogeneity such as including both parents’ frailty in 
the model of inheritance, or explicit mixtures of genetic and environmental sources of frailty, 
would require complexities that may take them beyond the realm of analytically tractable meth- 
ods such as those used by Vaupel and Yashin. In the context of complex models of fertility with 
unobserved heterogeneity, Heckman and Walker (1987) found monte carlo microsimulation to 
be more effective than analytical methods. On the other hand, as long as analytic methods 
remain feasible, costs are generally lower and results are more general. 


Multivariate Stochastic Process Models 


In this section we consider three related multivariate stochastic process models. Empiri- 
cally, these models are based on complex analyses of longitudinal microdata, including the use 
of time varying covariates, fuzzy set representations, and multivariate stochastic processes. 


The first example (Manton and Stallard, 1991) is similar to Wilkins and Adams (1983) in 
that it provides an additive decomposition of life expectancy for the elderly population. The 
analysis employs a novel procedure, GoM (grade of membership), for dealing with high dimen- 
sional discrete response data sets. Such data are characteristic of disability surveys which ask 
many questions, for example activities of daily living, where there are typically a handful of 
discrete responses possible for each question. These surveys clearly show that especially for the 
older old, disability is multidimensional. In order to reflect this in a generalization of DFLE, the 
Manton and Stallard model seeks a small and reasonable set of disability states, and then esti- 
mates the portions of their lifetimes individuals can expect to spend in each of these states. 


GoM analysis is applied to the National Long Term Care Survey (NLTCS). This empirical 
analysis filters the effects of errors in variables and reveals latent structure in the pattern of 
responses in the form of clusters of disability "syndromes". Each syndrome consists of a 
weighted collection of the 27 ADL, IADL and functional limitation measures included in the sur- 
vey. The balance of the elderly (Medicare) population is classified as being either a survey non- 
respondent or institutionalized. However, individuals are not characterized by being in only one 
or another of these syndromes. Rather, any one individual is characterized as a weighted average 
of all of the syndromes, using a vector of GoM scores which sum to one. It is in this sense that 
there is "fuzzy set membership". The latent structure revealed by the GoM analysis from the 
survey data, based on 6 clusters of disability, suggests one active, three moderately impaired, and 
two heavily impaired states, to which are added two more pure groups, survey non-respondents 
and the institutionalized, so that the entire population is covered. 


Since each individual in the sample now has some weighted average membership in each 
of the disability syndromes, the distribution of these memberships for the entire population can 
be computed by age and sex. The resulting cross-sectional disability prevalences (where disabil- 
ity is defined in terms of partial membership in each of the first 6 clusters, or pure membership in 
one or other of the last 2 states) are then combined with standard life expectancy estimates as in 
the Sullivan and Wilkins and Adams models in order to estimate life expectancy free of disabil- 
ity, and in each of the 8 disability clusters. 


As in our review of the other models, a portion of the Manton and Stallard (1991) analysis 
can be recast in the individual biographical framework in Figure 1, and the quantitative results 
could be reproduced using monte carlo microsimulation methodology. The basic empirical 
ingredients are the GoM results from the NLTCS, and a series of all-cause mortality rates. The 
idea, as with the previous models, is to begin constructing one individual’s biography. S/he 
starts life at age 65, and subsequent survival is determined using all-cause mortality rates as for 
the Sullivan method. This process effectively creates the entries in an alive or dead row of the 
microdata record in Figure 1. Then at each age where the individual is alive, and conditional on 
age and sex, the individual is assigned a level of functional impairment represented by an 8-tuple 
of GoM scores drawn at random from the estimated joint distribution. In other words, eight row 
elements of the functional status portion of the array in Figure 1 at age a are assigned the GoM 
8-tuple scores. This process is repeated at each subsequent age with independent draws of 8-tu- 
ple GoM scores until the individual dies. 


This process of generating an individual biography is then repeated over individuals until a 
sufficiently large sample (say N) of synthetic biographies is generated. We now have an 8 by N 
by A (the maximum age) array of GoM scores over a representative, albeit synthetic, population 
age cohort. (Implicitly, we also have via the lambdas from the GoM analysis a 29 by N by A 
array of disabilities as measured in the survey). If we average over the N synthetic biographies, 
and sum over the A years, we then have an estimate of the 8-tuple of life expectancies in each of 
the eight disability states. As long as N is large, this procedure will exactly reproduce the Man- 
ton and Stallard results. 
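
The tabulation can be made concrete with a short sketch. Everything numerical here is a placeholder: the mortality schedule is invented, and a symmetric Dirichlet draw stands in for the estimated joint distribution of GoM scores, which in the actual analysis would depend on age and sex.

```python
import random

N_STATES = 8

def death_prob(age):
    """Hypothetical all-cause annual death probability from age 65."""
    return min(1.0, 0.015 * 1.1 ** (age - 65))

def draw_gom_scores(age, rng):
    """Placeholder for a draw from the estimated joint distribution of GoM
    scores at this age: a symmetric Dirichlet draw, so the scores sum to one."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(N_STATES)]
    total = sum(g)
    return [x / total for x in g]

def simulate_biography(rng, start_age=65):
    """Return one GoM 8-tuple per year lived after start_age."""
    rows, age = [], start_age
    while rng.random() >= death_prob(age):
        rows.append(draw_gom_scores(age, rng))   # independent draw each year
        age += 1
    return rows

rng = random.Random(4)
N = 2000
biographies = [simulate_biography(rng) for _ in range(N)]

# Sum the scores over ages within each biography, then average over the N.
state_le = [
    sum(row[k] for bio in biographies for row in bio) / N
    for k in range(N_STATES)
]
total_le = sum(len(bio) for bio in biographies) / N
```

Because each 8-tuple sums to one, the eight entries of state_le add up to total_le, so the procedure yields an additive decomposition of life expectancy at age 65 (counted here in completed years).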


Manton and Stallard (1991) extend their analysis of functional status in order to examine 
"disease-deleted" life expectancy in the various disability clusters identified by the GoM analy- 
sis. They did this by estimating the relationship between the 8-tuples of GoM scores and data on 
28 different diseases also collected in the NLTCS for each survey respondent. The GoM scores 
were estimated as 8 age-dependent functions of dummy variables indicating the presence of each 
of 28 diseases (i.e., 8 x 28 coefficients which themselves vary as a function of age). 


In the context of our translation of the Manton and Stallard model into the monte carlo 
microsimulation framework, their results on disease-deleted life expectancy in the various dis- 
ability clusters could be reproduced as follows: Instead of drawing a GoM 8-tuple at each age 
for each individual, draw a 28-tuple of diseases (each element zero or one according to whether 
or not the disease is present) from the distribution observed in the NLTCS. Next, compute the 
GoM 8-tuple by applying the appropriate regression equation (depending on age and sex) to the 
28 disease dummy variables. However, before applying this last step, set the i-th disease dummy 
to zero irrespective of the value drawn. We then effectively impute to the individual a GoM dis- 
ability based on the i-th disease being eliminated. Again this overall process is repeated to create 
a synthetic population of size N. Then appropriate tabulation of the resulting 8 by N by A array 
will again reproduce the Manton and Stallard results. This kind of disease-deleted analysis is 
informative since it provides a connection between diseases and their consequential impacts on 
disabilities in a multivariate manner, and using a readily accessible form of indicator -- namely a 
form of life expectancy. 
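
The disease-deletion step itself is small, as the following sketch suggests. The coefficients are invented, only one age-sex group is shown, and the clipping at zero and rescaling to a sum of one are our own assumptions about how to keep the imputed scores valid; they are not taken from Manton and Stallard (1991).

```python
import random

N_DISEASES, N_STATES = 28, 8
rng = random.Random(5)

# Hypothetical regression coefficients. State 0 is taken to be the "active"
# cluster; each disease lowers its score and raises the scores of the other
# clusters by invented amounts.
INTERCEPT = [1.0] + [0.05] * (N_STATES - 1)
COEF = [[-0.1] * N_DISEASES] + [
    [rng.uniform(0.0, 0.05) for _ in range(N_DISEASES)]
    for _ in range(N_STATES - 1)
]

def impute_gom(diseases, deleted=None):
    """GoM 8-tuple implied by a 28-tuple of disease dummies.
    If `deleted` is given, that disease dummy is set to zero first."""
    d = list(diseases)
    if deleted is not None:
        d[deleted] = 0
    raw = [
        max(0.0, INTERCEPT[k] + sum(c * x for c, x in zip(COEF[k], d)))
        for k in range(N_STATES)
    ]
    total = sum(raw)
    return [r / total for r in raw]    # rescaled so the scores sum to one

diseases = [0] * N_DISEASES
diseases[3] = 1                        # an individual drawn with one disease
observed = impute_gom(diseases)
disease_deleted = impute_gom(diseases, deleted=3)
```

In this sketch the deleted version assigns more weight to the active cluster than the observed version; repeating the imputation at every age of every synthetic biography and tabulating as before would give the disease-deleted life expectancies.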


The second model, Dowd and Manton (1991), provides forecasts of cause-specific mortal- 
ity based on risk factor prevalences at a starting age (30), the dynamic evolution of these multi- 
variate risk factors, and a series of cause-specific mortality rates dependent on risk factors. Five 
risk factors are considered (systolic and diastolic blood pressure, body mass index, total serum 
cholesterol, and cigarette consumption), and three causes of mortality (cancer, CVD, and all 
other causes). There is no explicit morbidity or disability. (Morbidity transitions can be 
included in the model; see Tolley and Manton, 1991. Also, the number of risk factors has been 
extended to 12 with morbidity explicitly represented; see Manton et al., 1991b.) Estimates of 
labor force productivity are incorporated to estimate the indirect costs or benefits in terms of 
earnings in a country where risk factor distributions among the population are deteriorating or 
improving. 

Again the model can be described by translating it to the framework in Figure 1. We 
assume that an individual is "born" at age 30 already endowed (at age 29, say) with a 5-tuple of 
risk factor levels drawn at random from the observed continuously distributed multivariate cross- 
sectional prevalences, based on country-specific data on the age- and sex-specific risk factor 
means and variances, as in the original model. The first step in synthesizing each subsequent 
year of the individual’s life (i.e. adding a new column to the left of the array in Figure 1) is to 
assign a 5-tuple of risk factor levels for the new year. This process is recursive, represented by a 
linear first-order autoregressive system based on analysis of 20 years of Framingham data. The 
monte carlo simulation to approximate this process draws a 5-tuple of random numbers from the 
distribution of residuals in the estimated autoregressive equation. Then these residuals are added 
to a linear function of the individual’s 5-tuple of risk factors at age a-1. The result is the individ- 
ual’s risk profile in the current year. 
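
One year of this recursion can be sketched as follows. The intercepts, the coefficient matrix (diagonal here), and the residual standard deviations are invented placeholders, and the residuals are assumed to be independent and Gaussian; the estimates in the original model come from the Framingham data and are not reproduced here.

```python
import random

FACTORS = ["systolic_bp", "diastolic_bp", "bmi", "cholesterol", "cigarettes"]
K = len(FACTORS)

# Hypothetical parameters of x[a] = C + A x[a-1] + e[a].
C = [13.0, 8.0, 2.5, 22.0, 0.0]
A = [[0.9 if i == j else 0.0 for j in range(K)] for i in range(K)]
RESID_SD = [5.0, 3.0, 0.5, 10.0, 1.0]

def ar_step(prev, rng, resid_sd=RESID_SD):
    """Risk factor 5-tuple at age a given the 5-tuple at age a-1."""
    residuals = [rng.gauss(0.0, s) for s in resid_sd]
    return [
        C[i] + sum(A[i][j] * prev[j] for j in range(K)) + residuals[i]
        for i in range(K)
    ]

rng = random.Random(7)
profile = [130.0, 80.0, 25.0, 220.0, 10.0]   # endowment at age 29
history = {29: profile}
for age in range(30, 41):
    profile = ar_step(profile, rng)
    history[age] = profile
```

A fuller version would fill in the off-diagonal entries of A, draw correlated residuals, and prevent quantities such as cigarette consumption from becoming negative.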


The second step is to determine whether the individual survives the year. First, the individ- 
ual’s mortality risk for each of the three causes is computed based on the individual’s age, sex, 
and contemporaneous 5-tuple of risk factor levels. The mortality is described as a Gompertz 
function of age times a cause-specific constant times a quadratic function of the (time varying) 
risk factors. Finally, as a function of age (and country), a dollar level of annual wages is 
assigned. 


It should be noted that there is a problem of selective attrition. Those with high levels of 
risk factors are more likely to die, and hence less likely to be present at later ages. This phenom- 
enon is taken into account in the Dowd and Manton analysis by some rather complex adjust- 
ments to the risk factor dynamics equation coefficients, which were originally estimated without 
any consideration of mortality. For purposes of the monte carlo microsimulation version of this 
model, those adjustments are not needed. The reason, simply, is that the analysis is cast at the 
individual rather than the population level so the selective attrition occurs naturally. The joint 
risk factor prevalence distribution that would result from aggregating a synthetically generated 
sample at a given age would automatically reflect the higher mortality rates of those individuals 
at higher levels of risk. 
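
This point can be checked with a small experiment. The sketch below uses a single standardized risk factor in place of the 5-tuple, an invented Gompertz baseline, and an invented quadratic risk multiplier; none of these values come from Dowd and Manton (1991).

```python
import math
import random

RHO = 0.95   # assumed year-to-year persistence of the risk factor

def death_prob(age, x):
    """Gompertz baseline times a quadratic function of the risk factor."""
    gompertz = 0.001 * math.exp(0.09 * (age - 30))
    quadratic = 1.0 + 0.5 * x + 0.25 * x * x     # positive for every x
    return 1.0 - math.exp(-gompertz * quadratic)

def simulate(n, rng, last_age=85):
    """Return {age: risk factor values of those alive at that age}."""
    alive_x = {age: [] for age in range(30, last_age + 1)}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                  # endowment at age 30
        for age in range(30, last_age + 1):
            alive_x[age].append(x)
            if rng.random() < death_prob(age, x):
                break
            # Individual-level dynamics: no mortality adjustment is applied.
            x = RHO * x + rng.gauss(0.0, math.sqrt(1.0 - RHO ** 2))
    return alive_x

rng = random.Random(6)
alive_x = simulate(10000, rng)
mean_x = {age: sum(v) / len(v) for age, v in alive_x.items()}
```

The individual-level equation has a long-run mean of zero, yet the mean among survivors in mean_x drifts below zero at older ages. The selective attrition therefore appears in the aggregated sample without any adjustment to the dynamics coefficients.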


Again as in our discussions of the other methods, the Dowd and Manton overall results 
could be approximated by repeatedly generating synthetic biographies according to the process 
just described for a large sample say of size N. Samples would have to be generated for each of 
the six countries analyzed, and for each of the three risk factor scenarios considered (baseline, 
high risk, and ideal) for a total of 18 samples. For any one of these synthetic cohort samples, 
median ages at death for the entire cohort and for those dying of each specific cause can be com- 
puted in the usual way. A present discounted value of lifetime earnings can readily be computed 
for each individual in the sample, and averaged over the N individuals, exactly as in the original 
model. 


A third example is given in Manton et al. (1991) which combines properties of the pre- 
vious two examples. The first example above (Manton and Stallard, 1991) used GoM scores to 
describe individuals’ fuzzy membership in 8 disability clusters. This analysis uses essentially the 
same 8 disability clusters. However, instead of assuming (implicitly) that disability status is 
independent from one year of age to the next, this analysis uses the GoM estimation methodol- 
ogy to describe the evolution of the 8-tuple GoM scores as a first order auto-regressive process. 
In terms of Figure 1, this means that an individual’s 8-tuple of disability cluster membership at 
age a is not drawn at random from the observed distribution. Rather, an 8-tuple would be drawn 
from the distribution of residuals in the estimated auto-regressive equation. These residuals are 
then added to an age-specific linear function of his or her 8-tuple of GoM scores in the immedi- 
ately preceding year. 


The next part of the Manton et al. (1991) model uses these GoM scores to condition mor- 
tality. This is similar to the Dowd and Manton (1991) model in that mortality is based on a mul- 
tiplicative adjustment to a Gompertz function. However, instead of adjusting the basic 
(disease-specific) Gompertz mortality rate by a function of the individual’s contemporaneous 
risk factor 5-tuple, this analysis multiplies an overall Gompertz mortality function by a quadratic 
function of the individual’s current GoM scores on the disability clusters. 


An important source of complexity in this model is the interaction between disability and 
mortality. The model is considerably more general than the multi-state life table models 
reviewed earlier, because disability is multidimensional. At the same time, if some disability 
states have higher mortality, then this will affect the evolution at the population level of the joint 
distribution of disabilities, in this analysis represented by the distribution of the 8-tuple of GoM 
scores. The auto-regressive equation does not include death as one of the states. Thus, a com- 
plex series of adjustments is required in the formal analytical model in order to account for this 
differential attrition from disability states as a result of differential mortality. Exactly analogous 
to the discussion of the Dowd and Manton model above, such adjustments are not required in the 
monte carlo microsimulation translation of the Manton et al. (1991) model just described. Since 
the simulation is at the microdata level rather than at the level of complete populations, the attri- 
tion due to differential mortality occurs naturally. 


These three models clearly incorporate a great deal of multivariate and dynamical richness 
in describing risk factor, disease, disability and mortality processes. The models reflect the strat- 
egy of integrating data from different sources. Again, trade-offs between analytical and simu- 
lation methods can be identified. The analytic methods used in the models are less costly 
computationally than the microsimulation approach. However, maintaining analytic tractability 
incurs some costs. For example, the mathematical adjustments to account for differential mortal- 
ity (by risk factor or disability status) in the last two models involve significant added complex- 
ity plus some degree of approximation. Also, the empirical strategy in the third model builds on 
a first order stochastic process for the multivariate risk factor distribution estimated from the 
Framingham data. Higher order stochastic relationships are also likely to be significant. How- 
ever, incorporating them would add considerable further complication to the analytic form of the 
model, but not much complication to a monte carlo microsimulation implementation. 


POHEM 


The final model covered in this review is that of Wolfson (1992). This model developed 
out of a broader effort to re-orient fundamentally the system of health statistics in Canada (Wolf- 
son, 1991). The demographic part of the model was developed originally in 1983 to support 
analysis of pension reform proposals. The POHEM (population health model) generalization 
was begun in 1988 with the explicit objective of generalizing the DFLE concept to PHE (popula- 
tion health expectancy). The translation of POHEM into the framework of Figure 1 is direct 
because the model is based on this same data structure and uses monte carlo microsimulation 
methods. In other words, POHEM is an actual instance of the general class of microsimulation 
models, as compared to the other models where we have had to sketch, as thought experiments, 
how they could be recast as other instances of microsimulation modeling. 


An overview of POHEM can be provided by indicating the variables in the state space, and 
the nature of the "laws of motion" -- the functions generating the successive columns of the indi- 
vidual arrays. The current version of POHEM creates not just individuals, but male-female 
pairs. This is done in anticipation of a union. As well, children and remarriage partners are 
explicitly included in this extended family structure or "case". The full lifecycle of each case is 
simulated, not just one individual at a time. A case is completed with the death of the last adult 
(and the last child leaving home) before another is commenced. The processes available for 
explicit modeling in the current version of POHEM, as well as some under development -- the 
so-called laws of motion grouped in the same way as the rows in Figure 1, are as follows: 


Environmental Exposures 
Radon -- endowed at birth by drawing from the observed distribution of levels within 
residential dwellings. 
Socio-Economic Status 


Educational Attainment -- endowed at birth by drawing from univariate distributions 
and husband-wife correlations based on census data. 


First Union -- either legal marriage or common law union (CLU); probability at each 
age represented by a multivariate hazard function of age, sex, education, fertil- 
ity (for females), labour force history, CLU history, and pre-ordained marria- 
geability (for observed heterogeneity, Rowe and Wolfson, 1990). 


First Spousal Age Difference -- based on age at marriage, and observed joint distri- 
bution of brides’ and grooms’ ages. 


Fertility -- probability function of age, parity, and marital status. 


Union Dissolution -- either divorce or separation; probability at each age represented 
by a multivariate hazard function of age, duration of marriage, presence of 
children, labour force experience, age at marriage (Rowe and Wolfson, 1990). 


Child Custody -- marital status. 
Child Leaving Home -- probability based on age, sex, and birth order. 
Remarriage -- probability based on age, sex, divorce versus widow(er). 


Second Spousal Age Difference -- based on marrying person’s age at marriage, sex, 
and prior marital status drawn from the observed joint distribution of brides’ 
and grooms’ ages. 


Labour Force Participation -- probability of entry or exit at each age represented by a
set of multivariate hazard functions of age, sex, marital status, presence of
children by age group, educational attainment, and duration in state (Picot,


Labour Market Earnings -- dollar level each year based on an autoregressive stochas- 
tic process with parameters based on age, sex, and strength of labour force 
attachment (Kennedy, 19XX). 


Risk Factors 


Blood Pressure, Obesity, Smoking and Cholesterol -- quadrivariate joint density at 
age a derived as first order Markov function of quadrivariate joint density at 
age a-1, age, and sex based on analysis of the 1978 Canada Health Survey 
(Gentleman et al., 1989). 


Diseases 


CHD -- exactly as in Weinstein et al. (1987) except that Canadian risk factor distributions
and treatment protocols are substituted (Wolfson and Birkett, 1989).


Lung Cancer -- incidence conditional on cumulative radon and tobacco exposure up 
to age a-10; site, type, and stage assigned based on cancer registry distributions 
by age and sex; progression and case fatality conditional on site, type and stage 
based on meta-analysis of clinical literature (Gentleman et al., 1991). 


Breast Cancer -- incidence based on age, parity, age at first birth; progression and 
case fatality based on fourth-order Markov transitions among disease-free, 
localized recurrence, and metastatic states. 


Dementia -- incidence based on age and sex; progression based on duration since 
onset (Forbes and Barham, 1989). 


Arthritis/Rheumatism -- incidence and progression based on combination of the 
1986 post-censal Health and Activity Limitations Survey and expert consensus 
(Tugwell et al., 1992, Chambers et al., 1991) 

Mortality from Other Causes -- based on age, sex, and marital status. 

Functional Status -- under development; 1990 Ontario Health Survey (Ontario, 1990) 
categories are planned (gross motor, dexterity, hearing, seeing, communicating, cog- 
nitive, emotion, pain); an alternative module is available currently that has three 
states (mild, moderate, severe) based on first order Markov transitions among states 
based on the 1986 post-censal Health and Activity Limitations Survey (this module
ignores risk factors and diseases). 

Health Care Utilization -- Lung Cancer treatments based on Ottawa Civic Hospital 
(Evans, Will, Wolfson and Berthelot 1991). 

Summary Health Status -- Torrence-style multi-attribute value scale is planned (Tor- 
rence, 1987). Currently using arbitrary weights for disease states similar in style to 
Wilkins and Adams (1983) 

These dynamic processes are illustrated using computer graphics and animation in the Health
Information Template (Wolfson, 1992b). POHEM has been applied to analyses of DFLE using
the HALS data, lung cancer (Evans et al., 1991), and CHD (Wolfson and Birkett, 1989; Wolfson,
1992c).
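
The notion of "laws of motion" generating successive columns of an individual's state array can be illustrated with a minimal sketch. It is not POHEM code: it follows a single individual rather than a family case, and the variable names, the smoking transition probabilities, and the disease and mortality hazards are all hypothetical values chosen only to show how one column is derived from the previous one.

```python
import random

def next_column(prev, age, rng):
    """Build the state column at `age` from the previous column.

    All probabilities below are invented for illustration only.
    """
    col = dict(prev)
    col["age"] = age
    # Risk factor row: first-order Markov process for smoking status.
    if prev["smoker"]:
        col["smoker"] = rng.random() >= 0.05   # hypothetical quit probability
    else:
        col["smoker"] = rng.random() < 0.02    # hypothetical uptake probability
    col["smoking_years"] = prev["smoking_years"] + (1 if prev["smoker"] else 0)
    # Disease row: incidence hazard rises with cumulative exposure.
    if not prev["disease"]:
        hazard = 0.0005 + 0.0002 * col["smoking_years"]
        col["disease"] = rng.random() < hazard
    # Mortality row: depends on age and on disease status.
    death_prob = 0.002 + 0.0001 * age + (0.05 if col["disease"] else 0.0)
    col["alive"] = rng.random() >= min(1.0, death_prob)
    return col

def simulate_individual(rng, max_age=110):
    """Return the list of yearly state columns for one synthetic life path."""
    col = {"age": 0, "smoker": False, "smoking_years": 0,
           "disease": False, "alive": True}
    history = [col]
    for age in range(1, max_age + 1):
        col = next_column(col, age, rng)
        history.append(col)
        if not col["alive"]:
            break
    return history

if __name__ == "__main__":
    path = simulate_individual(random.Random(42))
    print("simulated ages 0 to", path[-1]["age"])
```

Each row of the state array (risk factor, disease, mortality) is updated by its own rule, and later rows may read earlier ones, which is the dependency structure the module list above describes.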


PROSPECTS AND POSSIBILITIES 


This review has covered a number of population health models related either to the estimation
of concepts like DFLE or to quantitative analysis of disease and other health-related processes.
These two strands of analysis have potential synergies, but are only recently being
brought together. The DFLE models tend to omit explicit morbidity processes and have highly 
simplified descriptions of disability dynamics. The disease models, on the other hand, tend not 
to be designed to compute DFLE statistics. 


However, the elements in this range of models suggest that the ingredients, in at least rudi- 
mentary form, are all available for more complete and integrated models. These elements 
include models of basic demographic and socio-economic dynamics, latent heterogeneity, risk 
factor dynamics, environmental influences, disease incidence and progression, the impacts of 
diseases or groups of diseases on functional status -- and on health care utilization, and finally 
mappings from multivariate disease and functional limitation states to a summary measure of 
health status. Thus we conclude that integrated models are feasible and that many of the 
required building blocks exist. 


What are the benefits of such integrated and more highly multivariate models? One is the 
same as for the very simple life table-based DFLE models -- namely the estimation of an important
health indicator that begins to go beyond mortality to take account of the prevalence of morbidity
in a population. We need measures that allow us to assess progress not only in "adding years to 
life" but also "life to years", more sensitive measures like population health expectancy (PHE). 
The second major reason for more integrated models is to address the biases and inaccuracies 
resulting from the strong simplifying assumptions in the basic Sullivan method -- for example 
that disability is independent of prior disability history or medical history or risk factors, let 
alone socio-economic and environmental factors. 


A third major reason is that having included such a range of factors explicitly in the calcu- 
lation of a reasonable population-based health outcome indicator such as PHE, we have in effect 
created a numeraire for comparing the importance of a wide range of health determinants --
ranging from acute health care to health promotion interventions. With such integrated methods,
we could estimate not only conventional measures like cause-deleted life expectancy, but also
"risk factor-deleted" or "intervention-added" PHE. This would allow the relative importance of
various health problems to be assessed in terms of their burden of morbidity, and not just their 
impact on mortality. In turn this might have the salutary effect of facilitating a reallocation of 
health-related resources away from acute end-stage interventions toward efforts to reduce more 
prevalent but less fatal chronic conditions, or to prevention. 


Moreover, if a monte carlo microsimulation methodology is used to recast the various models,
it operates at the level of individuals rather than aggregated groups. This in turn allows analysis
and estimation of distributional results. It provides a framework for analyzing health
inequalities. It also provides an explicit microdata foundation for a summary index like PHE. 
Thus, families of sub-indices are readily constructed. The full sample of underlying individual
synthesized life paths is also available. These microdata provide a very powerful basis for vali- 
dation and assessing the plausibility of the simulations. They also provide an efficient basis for 
deriving a broad and open range of statistics defined over the sample of life paths, for example 
distributions of durations in various combinations of health and demographic states. 
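
As a hedged illustration of a statistic "defined over the sample of life paths", the sketch below tabulates the distribution of spell durations in a given state. It assumes each life path is stored as a list with one state label per simulated year; that representation and the function names are assumptions made for this example, not a description of POHEM's internal data structures.

```python
from collections import Counter
from itertools import groupby

def spell_durations(path, state):
    """Lengths (in years) of each consecutive run of `state` in one life path."""
    return [sum(1 for _ in run) for label, run in groupby(path) if label == state]

def duration_distribution(paths, state):
    """Relative frequency of spell lengths in `state` across all life paths."""
    counts = Counter()
    for path in paths:
        counts.update(spell_durations(path, state))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in sorted(counts.items())}

if __name__ == "__main__":
    # Two tiny hand-written life paths, purely illustrative.
    sample = [
        ["healthy", "healthy", "disabled", "disabled", "healthy"],
        ["healthy", "disabled", "disabled", "disabled"],
    ]
    print(duration_distribution(sample, "disabled"))
```

Because the statistic is computed after the fact from stored paths, new tabulations can be added without rerunning the simulation.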


An important concern is the minimum level of complexity for such models. On the one
hand, highly complex models tend to be used only by their developers. An advantage of the 
Sullivan-style life table methods is that their simplicity allows them to be easily replicated by 
groups in many countries. Unfortunately the results from such models require very strong 
assumptions. The question thus is whether a richer methodological framework can be developed 
that on the one hand can readily absorb and capture more realistic descriptions of the various
dynamic processes, and on the other hand is sufficiently accessible that researchers in many sites 
can use it. 


In this regard, macroeconometric models may provide a useful example. Each country has 
its own time series data on investment, consumption, interest rates, etc. However, there exist 
standard software packages for estimating relationships and, more importantly for our purposes, 
for bringing these quantitative descriptions of dynamic processes together in an integrated model 
-- typically expressed as systems of difference equations. The data structure of Figure 1 plus 
monte carlo microsimulation methodology may provide such a common framework. It appears 
from this review that the method nests the deductive portions of virtually all the models dis- 
cussed. For simpler models it is not the most effective method -- either in terms of computa- 
tional cost or complexity; but it has the advantage of offering a consistent framework for 
continuing incremental upgrades and increases in sophistication. 


One possibility is to build on the POHEM model. Work is currently underway at Statistics 
Canada to develop a generalized version of the software that will be portable among PCs and 
workstations and allow data from different countries to be entered. This generalized software is 
based on the monte carlo microsimulation methodology, and can be organized to provide a hier- 
archy of models covering a range of complexity. For example, at the simpler end of the spec- 
trum, most of the modules could be "turned off" leaving only the mortality module based on age- 
and sex-specific mortality rates and a disability module based on age- and sex-specific 
prevalence rates. This would replicate the Sullivan method. 
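
The nesting claim can be sketched as follows, under simplifying assumptions: single-year death probabilities, deaths occurring at mid-year, one sex, and disability drawn independently at each age from the prevalence rate. The function names and the toy inputs are hypothetical. The first function is the usual Sullivan life table calculation; the second is a microsimulation with only a mortality module and a prevalence-based disability module, which should agree with the first up to monte carlo error.

```python
import random

def sullivan_dfle(qx, prev, radix=100000.0):
    """Life expectancy and DFLE at age 0 by the Sullivan method.

    qx[a]   -- probability of dying between age a and a+1 (last entry 1.0)
    prev[a] -- disability prevalence at age a
    """
    lx = radix
    total_years = 0.0
    free_years = 0.0
    for q, p in zip(qx, prev):
        deaths = lx * q
        person_years = lx - 0.5 * deaths      # deaths assumed at mid-year
        total_years += person_years
        free_years += person_years * (1.0 - p)
        lx -= deaths
    return total_years / radix, free_years / radix

def simulate_dfle(qx, prev, n=20000, seed=1):
    """Same quantities from n simulated life paths with two modules only."""
    rng = random.Random(seed)
    total_years = 0.0
    free_years = 0.0
    for _ in range(n):
        for q, p in zip(qx, prev):
            dies = rng.random() < q            # mortality module
            lived = 0.5 if dies else 1.0
            total_years += lived
            if rng.random() >= p:              # disability module
                free_years += lived
            if dies:
                break
    return total_years / n, free_years / n

if __name__ == "__main__":
    qx = [0.01, 0.02, 0.05, 0.20, 1.0]         # toy schedule, not real data
    prev = [0.02, 0.05, 0.10, 0.30, 0.50]
    print(sullivan_dfle(qx, prev))
    print(simulate_dfle(qx, prev))
```

For a model this simple the life table calculation is clearly cheaper; the simulation form matters only because further modules can be switched on without changing the framework.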


As a next stage of sophistication, alternative mortality and disability modules would be 
available where mortality rates were assumed to be dependent on disability; and disability trans- 
itions conditional on prior disability status and a disaggregation into (say) mild, moderate and 
severe levels. This stage of complexity would provide a method to nest the Wilkins and Adams 
(1983) method with multiple kinds of disability plus the Rogers et al. (1990) and Crimmins et al. 
(1990) increment-decrement life table methods. Ideally, implementing this level of sophistica- 
tion in modeling would require longitudinal survey data. However, many countries could derive 
rough estimates given occasional cross-sectional disability prevalence surveys, provided the 
disability questions were accompanied by some recall questions on the duration of disability. An 
upgraded version of POHEM could provide a "turn-key" software framework that would also 
help to standardize the resulting DFLE or PHE estimates to improve international comparability. 
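
A minimal sketch of this next stage, assuming annual first-order transitions among four disability levels and mortality that depends on the current level, is given below. The transition and death probabilities are invented for illustration and are not age-specific; an applied version would estimate them by age and sex from longitudinal or recall data, as discussed above.

```python
import random

STATES = ["none", "mild", "moderate", "severe"]

# Hypothetical annual death probabilities, conditional on disability level.
DEATH = {"none": 0.01, "mild": 0.02, "moderate": 0.04, "severe": 0.10}

# Hypothetical annual transition probabilities among survivors (rows sum to 1).
MOVE = {
    "none":     {"none": 0.90, "mild": 0.08, "moderate": 0.02, "severe": 0.00},
    "mild":     {"none": 0.10, "mild": 0.75, "moderate": 0.12, "severe": 0.03},
    "moderate": {"none": 0.02, "mild": 0.10, "moderate": 0.73, "severe": 0.15},
    "severe":   {"none": 0.00, "mild": 0.02, "moderate": 0.08, "severe": 0.90},
}

def simulate_life_path(rng, start="none", max_years=120):
    """Return the sequence of disability states occupied, one per year lived."""
    path = []
    state = start
    for _ in range(max_years):
        path.append(state)
        if rng.random() < DEATH[state]:        # mortality depends on disability
            break
        weights = [MOVE[state][s] for s in STATES]
        state = rng.choices(STATES, weights=weights)[0]
    return path

def expected_years_by_state(n=5000, seed=1):
    """Average number of years spent in each disability state per life path."""
    rng = random.Random(seed)
    totals = {s: 0 for s in STATES}
    for _ in range(n):
        for s in simulate_life_path(rng):
            totals[s] += 1
    return {s: totals[s] / n for s in STATES}

if __name__ == "__main__":
    for state, years in expected_years_by_state().items():
        print(state, round(years, 2))
```

The years in the "none" state correspond to a DFLE-type figure, and weighting the four states by health status values would give a PHE-type figure; either is obtained from the same simulated paths.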


Beyond this stage, substantial collaboration would be both necessary and desirable. For 
example, POHEM already includes all of the underlying process descriptions of the Weinstein et 
al. CHD model. A working assumption might be that disease processes are sufficiently general 
that data from a wide variety of populations could be drawn upon, as has been done by Gunning- 
Schepers (1988) in her PREVENT model. On the other hand, risk factor prevalences and dyna- 
mics are more likely unique to specific populations, as are health care utilization rates, and 
techniques and unit costs of health care, so that various countries or research groups would have 
to develop their own indigenous data. 


Areas where substantial work is required are the mappings from disease states to disability
states, and then from disability states to a summary measure of health status. In the case of the 
first mapping, Grade of Membership (GoM) is a promising method for capturing the required 
multivariate many-to-many relationships. This would require the inclusion of fuzzy state projec- 
tion algorithms in future versions of POHEM. 


A major research effort has been initiated by the Ontario Ministry of Health and Torrence 
et al. at McMaster University that might serve as a foundation for multi-attribute summary health 
status measures. Alternatively, the work of Williams et al. (1990) on the EuroQol project could 
be drawn upon. Either could serve as the basis for the second mapping required -- from multiva- 
riate disability states to a summary health status value. 


In summary, a substantial augmentation of the DFLE style of summary health indicator appears
feasible, one that takes explicit account of disease processes, their risk factor precursors,
and their disability, summary health status, and health care cost sequelae. This augmentation
could build upon a common software framework like POHEM, the international range of
epidemiological research, the most sophisticated microanalytic methods of estimating hazard 
functions, and the latest GoM style multivariate fuzzy state analysis. This kind of approach 
offers major potential benefits both in integrating diverse strands of research, and in providing 
the basis for a significantly broadened range of internationally comparable health-related indica- 
tors. 